US 7,224,852 B2

VIDEO SEGMENTATION USING STATISTICAL PIXEL MODELING

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation-in-part of U.S. application Ser. No. 09/815,385, now Pat. No. 6,625,310, filed on Mar. 23, 2001, commonly-assigned, and incorporated herein by reference in its entirety.

FIELD OF THE INVENTION

The present invention relates to processing of video frames for use in video processing systems, for example, intelligent video surveillance (IVS) systems that are used as a part of or in conjunction with Closed Circuit Television Systems (CCTV) that are utilized in security, surveillance and related homeland security and anti-terrorism systems, IVS systems that process surveillance video in retail establishments for the purposes of establishing in-store human behavior trends for market research purposes, IVS systems that monitor vehicular traffic to detect wrong-way traffic, broken-down vehicles, accidents and road blockages, and video compression systems. IVS systems are systems that further process video after video segmentation steps to perform object classification, in which foreground objects may be classified as a general class such as animal, vehicle, or other moving but unclassified object, or may be classified in more specific classes such as human, small or large non-human animal, automobile, aircraft, boat, truck, tree, flag, or water region. In IVS systems, once such video segmentation and classification occurs, detected objects are processed to determine how their positions, movements and behaviors relate to user-defined virtual video tripwires and virtual regions of interest (where a region of interest may be an entire field of view, or scene). User-defined events that occur will then be flagged as events of interest that will be communicated to the security officer or professional on duty. Examples of such events include a human or a vehicle crossing a virtual video tripwire, a person or vehicle loitering in or entering a virtual region of interest or scene, or an object being left behind in or taken away from a virtual region or scene. In particular, the present invention deals with ways of segmenting video frames into their component parts using statistical properties of regions comprising the video frames.

BACKGROUND OF THE INVENTION

In object-based video compression, video segmentation for detecting and tracking video objects, and other types of object-oriented video processing, the input video is separated into two streams. One stream contains the information representing stationary background information, and the other stream contains information representing the moving portions of the video, to be denoted as foreground information. The background information is represented as a background model, including a scene model, i.e., a composite image composed from a series of related images, as, for example, one would find in a sequence of video frames; the background model may also contain additional models and modeling information. Scene models are generated by aligning images (for example, by matching points and/or regions) and determining overlap among them; generation of scene models is discussed in further depth in commonly-assigned U.S. patent application Ser. No. 09/472,162, filed Dec. 27, 1999, and Ser. No. 09/609,919, filed Jul. 3, 2000, both incorporated by reference in their entireties herein. In an efficient transmission or storage scheme, the scene model need be transmitted only once, while the foreground information is transmitted for each frame. For example, in the case of an observer (i.e., a camera or the like, which is the source of the video) that undergoes only pan, tilt, roll, and zoom types of motion, the scene model need be transmitted only once, because the appearance of the scene model does not change from frame to frame, except in a well-defined way based on the observer motion, which can be easily accounted for by transmitting motion parameters. Note that such techniques are also applicable in the case of other forms of motion, besides pan, tilt, roll, and zoom. In IVS systems, the creation of distinct moving foreground and background objects allows the system to attempt classification on the moving objects of interest, even when the background pixels may be undergoing apparent motion due to pan, tilt and zoom motion of the camera.

To make automatic object-oriented video processing feasible, it is necessary to be able to distinguish the regions in the video sequence that are moving or changing and to separate (i.e., segment) them from the stationary background regions. This segmentation must be performed in the presence of apparent motion, for example, as would be induced by a panning, tilting, rolling, and/or zooming observer (or due to other motion-related phenomena, including actual observer motion). To account for this motion, images are first aligned; that is, corresponding locations in the images (i.e., frames) are determined, as discussed above. After this alignment, objects that are truly moving or changing, relative to the stationary background, can be segmented from the stationary objects in the scene. The stationary regions are then used to create (or to update) the scene model, and the moving foreground objects are identified for each frame.

It is difficult to identify and automatically distinguish between video objects that are moving foreground and stationary background, particularly in the presence of observer motion, as discussed above. Furthermore, to provide the maximum degree of compression or the maximum fineness or accuracy of other video processing techniques, it is desirable to segment foreground objects as finely as possible; this enables, for example, the maintenance of smoothness between successive video frames and crispness within individual frames. Known techniques, however, have proven difficult to use and inaccurate for small foreground objects, and have required excessive processing power and memory. It would, therefore, be desirable to have a technique that permits accurate segmentation between the foreground and background information and accurate, crisp representations of the foreground objects, without the limitations of prior techniques.

SUMMARY OF THE INVENTION

The present invention is directed to a method for segmentation of video into foreground information and background information, based on statistical properties of the source video. More particularly, the method is based on creating and updating statistical information pertaining to a characteristic of regions of the video and the labeling of those regions (i.e., as foreground or background) based on the statistical information. For example, in one embodiment, the regions are pixels, and the characteristic is chromatic intensity. Many other possibilities exist, as will become apparent. In more particular embodiments, the invention is directed to methods of using the inventive video segmentation methods to implement intelligent video surveillance systems.

In embodiments of the invention, a background model is developed containing at least two components. A first component is the scene model, which may be built and updated, for example, as discussed in the aforementioned U.S. patent applications. A second component is a background statistical model.

In a first embodiment, the inventive method comprises a two-pass process of video segmentation. The two passes of the embodiment comprise a first pass in which a background statistical model is built and updated and a second pass in which regions in the frames are segmented. An embodiment of the first pass comprises steps of aligning each video frame with a scene model and updating the background statistical model based on the aligned frame data. An embodiment of the second pass comprises, for each frame, steps of labeling regions of the frame and performing spatial filtering.

In a second embodiment, the inventive method comprises a one-pass process of video segmentation. The single pass comprises, for each frame in a frame sequence of a video stream, steps of aligning the frame with a scene model; building a background statistical model; labeling the regions of the frame; and performing spatial/temporal filtering.

In yet another embodiment, the inventive method comprises a modified version of the aforementioned one-pass process of video segmentation. This embodiment is similar to the previous embodiment, except that the step of building a background statistical model is replaced with a step of building a background statistical model and a secondary statistical model.

Each of these embodiments may be embodied in the forms of a computer system running software executing their steps and a computer-readable medium containing software representing their steps.

DEFINITIONS

In describing the invention, the following definitions are applicable throughout (including above).

A "computer" refers to any apparatus that is capable of accepting a structured input, processing the structured input according to prescribed rules, and producing results of the processing as output. Examples of a computer include: a computer; a general purpose computer; a supercomputer; a mainframe; a super mini-computer; a mini-computer; a workstation; a micro-computer; a server; an interactive television; a hybrid combination of a computer and an interactive television; and application-specific hardware to emulate a computer and/or software. A computer can have a single processor or multiple processors, which can operate in parallel and/or not in parallel. A computer also refers to two or more computers connected together via a network for transmitting or receiving information between the computers. An example of such a computer includes a distributed computer system for processing information via computers linked by a network.

A "computer-readable medium" refers to any storage device used for storing data accessible by a computer. Examples of a computer-readable medium include: a magnetic hard disk; a floppy disk; an optical disk, like a CD-ROM or a DVD; a magnetic tape; a memory chip; and a carrier wave used to carry computer-readable electronic data, such as those used in transmitting and receiving e-mail or in accessing a network.

"Software" refers to prescribed rules to operate a computer. Examples of software include: software; code segments; instructions; computer programs; and programmed logic.

A "computer system" refers to a system having a computer, where the computer comprises a computer-readable medium embodying software to operate the computer.

A "network" refers to a number of computers and associated devices that are connected by communication facilities. A network involves permanent connections such as cables or temporary connections such as those made through telephone or other communication links. Examples of a network include: an internet, such as the Internet; an intranet; a local area network (LAN); a wide area network (WAN); and a combination of networks, such as an internet and an intranet.

"Video" refers to motion pictures represented in analog and/or digital form. Examples of video include video feeds from CCTV systems in security, surveillance and anti-terrorism applications, television, movies, image sequences from a camera or other observer, and computer-generated image sequences. These can be obtained from, for example, a wired or wireless live feed, a storage device, a firewire interface, a video digitizer, a video streaming server, device or software component, a computer graphics engine, or a network connection.

"Video processing" refers to any manipulation of video, including, for example, compression and editing.

A "frame" refers to a particular image or other discrete unit within a video.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will now be described in further detail in connection with the attached drawings, in which:

FIG. 1 shows a flowchart corresponding to an implementation of a first embodiment of the invention;

FIGS. 2a and 2b show flowcharts corresponding to two alternative embodiments of the labeling step in the flowchart of FIG. 1;

FIGS. 3a and 3b show flowcharts corresponding to implementations of the spatial/temporal filtering step in the flowchart of FIG. 1;

FIG. 4 shows a flowchart corresponding to an implementation of a second embodiment of the invention;

FIG. 5 shows a flowchart corresponding to an implementation of one of the steps in the flowchart of FIG. 4;

FIGS. 6a and 6b together show a flowchart corresponding to an implementation of another one of the steps in the flowchart of FIG. 4;

FIG. 7 shows a flowchart corresponding to an implementation of a third embodiment of the invention;

FIGS. 8a and 8b together show a flowchart corresponding to an implementation of one of the steps in the flowchart of FIG. 7;

FIG. 9 depicts an embodiment of the invention in the form of software embodied on a computer-readable medium, which may be part of a computer system; and

FIG. 10 depicts a flowchart of a method of implementing an intelligent video surveillance system according to an embodiment of the invention.

Note that identical objects are labeled with the same reference numerals in all of the drawings that contain them.
DETAILED DESCRIPTION OF THE INVENTION

As discussed above, the present invention is directed to the segmentation of video streams into foreground information, which corresponds to moving objects, and background information, which corresponds to the stationary portions of the video. The present invention may be embodied in a number of ways, of which three specific ones are discussed below. These embodiments are meant to be exemplary, rather than exclusive.

The ensuing discussion refers to "pixels" and "chromatic intensity"; however, the inventive method is not so limited. Rather, the processing may involve any type of region (including regions comprising multiple pixels), not just a pixel, and may use any type of characteristic measured with respect to or related to such a region, not just chromatic intensity.

1. First Embodiment: Two-Pass Segmentation

The first embodiment of the invention is depicted in FIG. 1 and corresponds to a two-pass method of segmentation. As shown in FIG. 1, the method begins by obtaining a frame (or video) sequence from a video stream (Step 1). The frame sequence preferably includes two or more frames of the video stream. The frame sequence can be, for example, a portion of the video stream or the entire video stream. As a portion of the video stream, the frame sequence can be, for example, one continuous sequence of frames of the video stream or two or more discontinuous sequences of frames of the video stream. As part of the alignment step, the scene model is also built and updated.

After Step 1, in Step 2, it is determined whether or not all frames have yet been processed. If not, the next frame is taken and aligned with the underlying scene model of the video stream (Step 3); such alignment is discussed above, and more detailed discussions of alignment techniques may be found, for example, in commonly-assigned U.S. patent application Ser. No. 09/472,162, filed Dec. 27, 1999, and Ser. No. 09/609,919, filed Jul. 3, 2000, both incorporated by reference in their entireties herein, as well as in numerous other references.

The inventive method is based on the use of statistical modeling to determine whether a particular pixel should be classified as being a foreground object or a part thereof or as being the background or a part thereof. Step 4 deals with the building and updating of a statistical model of the background, using each frame aligned in Step 3.

The statistical model of the present invention comprises first- and second-order statistics. In the ensuing discussion, mean and standard deviation will be used as such first- and second-order statistics; however, this is meant to be merely exemplary of the statistics that may be used.

In general, the mean of $N$ samples, $\bar{x}$, is computed by taking the sum of the samples and dividing it by $N$, i.e.,

$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}, \qquad (1)$$

where $x_i$ is a particular sample corresponding to a given pixel (or region), which in the present case could be, for example, the measured chromatic intensity of the $i$th sample corresponding to the given pixel (or region). In the present setting, then, such a mean would be computed for each pixel or region.

While Eqn. (1) gives the general formula for a sample mean, it may not always be optimal to use this formula. In video processing applications, a pixel's sample value may change drastically when an object moves through the pixel and change (drastically) back to a value around its previous value after the moving object is no longer within that pixel. In order to address this type of consideration, the invention utilizes a weighted average, in which the prior values are weighted more heavily than the present value. In particular, the following equation may be used:

$$\bar{x}_N = W_p\,\bar{x}_{N-1} + W_n\,x_N, \qquad (2)$$

where $W_p$ is the weight of the past values and $W_n$ is the weight assigned to the newest value. Additionally, $\bar{x}_J$ represents the weighted average taken over $J$ samples, and $x_K$ represents the $K$th sample. $W_p$ and $W_n$ may be set to any pair of values between zero and one such that their sum is one and such that $W_n < W_p$, so as to guarantee that the past values are more heavily weighted than the newest value. As an example, the inventors have successfully used $W_p = 0.9$ and $W_n = 0.1$.

Standard deviation, $\sigma$, is determined as the square root of the variance, $\sigma^2$, of the values under consideration. In general, variance is determined by the following formula:

$$\sigma^2 = \overline{x^2} - \left(\bar{x}\right)^2, \qquad (3)$$

where $\overline{x^2}$ represents the average of $x^2$; thus, the standard deviation is given by

$$\sigma = \sqrt{\overline{x^2} - \left(\bar{x}\right)^2}. \qquad (4)$$

Because the inventive method uses running statistics, this becomes

$$\sigma_N = \sqrt{\overline{x^2}_N - \left(\bar{x}_N\right)^2}, \qquad (4a)$$

where $\bar{x}_N$ is as defined in Eqn. (2) above, and $\overline{x^2}_N$ is defined as the weighted average of the squared values of the samples, through the $N$th sample, and is given by

$$\overline{x^2}_N = W_p\,\overline{x^2}_{N-1} + W_n\,x_N^2. \qquad (5)$$

As in the case of the weighted average of the sample values, the weights are used to assure that past values are more heavily weighted than the present value.

Given this, Step 4 works to create and update the statistical model by computing the value of Eqn. (4a), and with it the running averages of Eqns. (2) and (5), for each pixel, for each frame. In Step 4, the values for the pixels are also stored on a pixel-by-pixel basis (as opposed to how they are received, i.e., on a frame-by-frame basis); that is, an array of values is compiled for each pixel over the sequence of frames. Note that in an alternative embodiment, Step 4 only performs this storage of values.

Following Step 4, the method returns to Step 2 to check whether or not all of the frames have been processed. If they have, then the method proceeds to Step 5, which commences the second pass of the embodiment.

In Step 5, the statistical background model is finalized. This is done by using the stored values for each pixel and determining their mode, the mode being the value that occurs most often. This may be accomplished, for example, by taking a histogram of the stored values and selecting the value for which the histogram has the highest value. The mode of each pixel is then assigned as the value of the background statistical model for that pixel.

Following Step 5, the method proceeds to Step 6, which determines whether or not all of the frames have been processed yet. If not, then the method proceeds to Step 7, in which each pixel in the frame is labeled as being a foreground (FG) pixel or a background (BG) pixel. Two alternative embodiments of the workings of this step are shown in the flowcharts of FIGS. 2a and 2b.

FIG. 2a depicts a two decision level method. In FIG. 2a, the pixel labeling Step 7 begins with Step 71, where it is determined whether or not all of the pixels in the frame have been processed. If not, then the method proceeds to Step 72 to examine the next pixel. Step 72 determines whether or not the pixel matches the background statistical model, i.e., whether the value of the pixel matches the mode for that pixel. This is performed by taking the absolute difference between the pixel value and the value of the background statistical model for the pixel (i.e., the mode) and comparing it with a threshold; that is,

$$\Delta = \left| x_{\mathrm{pixel}} - m_{\mathrm{pixel}} \right| \qquad (6)$$

is compared with a threshold $\theta$. In Eqn. (6), $x_{\mathrm{pixel}}$ denotes the value of the pixel, while $m_{\mathrm{pixel}}$ represents the value of the statistical background model for that pixel.

The threshold $\theta$ may be determined in many ways. For example, it may be taken to be a function of the standard deviation, $\sigma$, of the given pixel. In a particular exemplary embodiment, $\theta = 3\sigma$; in another embodiment, $\theta = K\sigma$, where $K$ is chosen by the user. As another example, $\theta$ may be assigned a predetermined value (again, for each pixel) or one chosen by the user.

If $\Delta \le \theta$, then the pixel value is considered to match the background statistical model. In this case, the pixel is labeled as background (BG) in Step 73, and the algorithm proceeds back to Step 71. Otherwise, if $\Delta > \theta$, then the pixel value is considered not to match the background statistical model, and the pixel is labeled as foreground (FG) in Step 74. Again, the algorithm then proceeds back to Step 71. If Step 71 determines that all of the pixels (in the frame) have been processed, then Step 7 is finished.

FIG. 2b depicts a three decision level method, labeled 7'. In FIG. 2b, the process once again begins with Step 71, a step of determining whether or not all pixels have yet been processed. If not, the process considers the next pixel to be processed and executes Step 72, the step of determining whether or not the pixel being processed matches the background statistical model; this is done in the same way as in FIG. 2a. If yes, then the pixel is labeled as BG (Step 73), and the process loops back to Step 71. If not, then the process proceeds to Step 75; this is where the process of FIG. 2b is distinguished from that of FIG. 2a.

In Step 75, the process determines whether or not the pixel under consideration is far from matching the background statistical model. This is accomplished via a threshold test similar to Step 72, only in Step 75, $\theta$ is given a larger value. As in Step 72, $\theta$ may be user-assigned or predetermined. In one embodiment, $\theta = N\sigma$, where $N$ is either a predetermined or user-set number, $N > K$. In another embodiment, $N = 6$.

If the result of Step 75 is that $\Delta \le \theta$, then the pixel is labeled as FG (Step 74). If not, then the pixel is labeled definite foreground (DFG), in Step 76. In each case, the process loops back to Step 71. Once Step 71 determines that all pixels in the frame have been processed, Step 7' is complete.

Returning to FIG. 1, once all of the pixels of a frame have been labeled, the process proceeds to Step 8, in which spatial/temporal filtering is performed. While shown as a sequential step in FIG. 1, Step 8 may alternatively be performed in parallel with Step 7. Details of Step 8 are shown in the flowcharts of FIGS. 3a and 3b.

In FIG. 3a, Step 8 commences with a test as to whether or not all the pixels of the frame have been processed (Step 81). If not, in Step 85, the algorithm selects the next pixel, $P_i$, for processing and proceeds to Step 82, where it is determined whether or not the pixel is labeled as BG. If it is, then the process goes back to Step 81. If not, then the pixel undergoes further processing in Steps 83 and 84.

Step 83, neighborhood filtering, is used to correct for misalignments when the images are aligned. If the current image is slightly misaligned with the growing background statistical model, then, particularly near strong edges, the inventive segmentation procedure, using the background statistical model, will label pixels as foreground. Neighborhood filtering will correct for this. An embodiment of Step 83 is depicted in the flowchart of FIG. 3b.

In FIG. 3b, Step 83 begins with Step 831, where a determination is made of the scene model location, $P_m$, corresponding to $P_i$. Next, a neighborhood, comprising the pixels, $P'_m$, surrounding $P_m$ in the scene model, is selected (Step 832). Step 833 next determines if all of the pixels in the neighborhood have been processed. If yes, Step 83 is complete, and the label of $P_i$ remains as it was; if not, the process proceeds to Step 834, where the next neighborhood pixel $P'_m$ is considered. Step 835 then tests to determine whether or not $P_i$ matches $P'_m$. This matching test is accomplished by executing the labeling step (Step 7 or 7') in a modified fashion, using $P_i$ as the pixel under consideration and $P'_m$ as the "corresponding" background statistical model point. If the labeling step returns a label of FG or DFG, there is no match, whereas if it returns a label of BG, there is a match. If there is no match, the process loops back to Step 833; if there is a match, then this is an indication that $P_i$ might be mislabeled, and the process continues to Step 836.

In Step 836, a neighborhood, comprising the pixels, $P'_i$, surrounding $P_i$ in the frame, is selected, and an analogous process is performed. That is, in Step 837, it is determined whether or not all of the pixels, $P'_i$, in the neighborhood have yet been considered. If yes, then Step 83 is complete, and the label of $P_i$ remains as it was; if not, then the process proceeds to Step 838, where the next neighborhood pixel, $P'_i$, is considered. Step 839 tests to determine if $P_m$ matches $P'_i$; this is performed analogously to Step 835, with the $P'_i$ under consideration being used as the pixel being considered and $P_m$ as its "corresponding" background statistical model point. If it does not, then the process loops back to Step 837; if it does, then $P_i$ is relabeled as BG, and Step 83 is complete.

Returning to FIG. 3a, following Step 83, Step 84 is executed, in which morphological erosions and dilations are performed. First, a predetermined number, $n$, of erosions are performed to remove incorrectly labeled foreground. Note that pixels labeled DFG may not be eroded, because they represent pixels that are almost certainly foreground. This is followed by $n$ dilations, which restore the pixels that were correctly labeled as foreground but were eroded. Finally, a second predetermined number, $m$, of dilations are performed to fill in holes in foreground objects. The erosions and dilations may be performed using conventional erosion and dilation techniques, applied in accordance with user-specified parameters, and modified, as discussed above, such that pixels labeled DFG are not eroded.
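The per-pixel arithmetic of the first embodiment (the running statistics of Step 4, the mode of Step 5, and the threshold tests of Steps 7 and 7') can be sketched in Python for the sequence of aligned values observed at a single pixel. This is a minimal illustration under stated assumptions, not the patented implementation: the names RunningStats, mode, label, and two_pass are hypothetical; the running averages are assumed to start from the first sample, since the text does not say how they are initialized; the histogram of Step 5 is assumed to count exact (e.g., integer) intensity values; and the defaults K = 3 and N = 6 are the example values given above.

```python
from collections import Counter
from math import sqrt

W_P, W_N = 0.9, 0.1  # example weights given in the text: W_p = 0.9, W_n = 0.1


class RunningStats:
    """Weighted running mean and standard deviation for one pixel (hypothetical helper)."""

    def __init__(self, first_value):
        # Assumption: both running averages are seeded with the first sample.
        self.mean = float(first_value)          # x-bar_N of Eqn. (2)
        self.mean_sq = float(first_value) ** 2  # weighted mean of x^2, Eqn. (5)

    def update(self, x):
        self.mean = W_P * self.mean + W_N * x            # Eqn. (2)
        self.mean_sq = W_P * self.mean_sq + W_N * x * x  # Eqn. (5)

    @property
    def std(self):
        # Eqn. (4a); clamp tiny negative values caused by floating-point rounding.
        return sqrt(max(self.mean_sq - self.mean ** 2, 0.0))


def mode(values):
    """Step 5: the most frequent stored value, i.e. the peak of the histogram."""
    return Counter(values).most_common(1)[0][0]


def label(x, m, sigma, k=3.0, n=None):
    """Step 7 when n is None (FIG. 2a); Step 7' when n is given (FIG. 2b)."""
    delta = abs(x - m)                   # Eqn. (6)
    if delta <= k * sigma:               # Step 72 with theta = K * sigma
        return "BG"
    if n is None or delta <= n * sigma:  # Step 75 with the larger theta = N * sigma
        return "FG"
    return "DFG"


def two_pass(samples, k=3.0, n=None):
    """Both passes of FIG. 1, applied to the value sequence of one pixel."""
    stats = RunningStats(samples[0])
    for x in samples[1:]:   # first pass (Steps 2-4): update the running statistics
        stats.update(x)
    m = mode(samples)       # Step 5: finalize the model as the mode of the stored values
    # Second pass (Steps 6-7): label the pixel's value in every frame.
    return [label(x, m, stats.std, k, n) for x in samples]
```

For a pixel that reads 100 in nine frames and 200 in one, the mode is 100 and Eqn. (4a) gives a standard deviation of 30 after the last update, so two_pass([100] * 9 + [200], n=6.0) labels the first nine values BG and the last one FG: a difference of 100 lies between 3σ = 90 and 6σ = 180.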
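Step 8 can be sketched in the same spirit on small 2-D lists. The function misaligned follows Steps 831 through 839, using the BG test of Eqn. (6) with θ = Kσ as the "modified" labeling step and assuming that the aligned model location P_m has the same (row, column) as P_i; the function morphology follows Step 84 and never erodes DFG pixels. The 3x3 neighborhood, the 3x3 square structuring element, the clipping at image borders, and all names are assumptions made for illustration, not details taken from the text.

```python
def _ring(r, c, rows, cols):
    """Yield the up-to-8 pixels surrounding (r, c), clipped at the image border."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr, dc) != (0, 0) and 0 <= rr < rows and 0 <= cc < cols:
                yield rr, cc


def _matches(x, m, sigma, k):
    # Stand-in for the "modified" labeling step: a BG result of Eqn. (6) means a match.
    return abs(x - m) <= k * sigma


def misaligned(frame, model, sigma, r, c, k=3.0):
    """Step 83: return True if the foreground pixel P_i at (r, c) should become BG."""
    rows, cols = len(frame), len(frame[0])
    ring = list(_ring(r, c, rows, cols))
    # Steps 832-835: does P_i match any model point P'_m surrounding P_m?
    if not any(_matches(frame[r][c], model[rr][cc], sigma[rr][cc], k) for rr, cc in ring):
        return False
    # Steps 836-839: does P_m match any frame pixel P'_i surrounding P_i?
    return any(_matches(frame[rr][cc], model[r][c], sigma[r][c], k) for rr, cc in ring)


def _erode(fg, keep):
    # A foreground pixel survives if it is protected (DFG) or fully surrounded by foreground.
    rows, cols = len(fg), len(fg[0])
    return [[fg[r][c] and (keep[r][c] or all(fg[rr][cc] for rr, cc in _ring(r, c, rows, cols)))
             for c in range(cols)] for r in range(rows)]


def _dilate(fg):
    # A pixel becomes foreground if it or any neighbor is foreground.
    rows, cols = len(fg), len(fg[0])
    return [[fg[r][c] or any(fg[rr][cc] for rr, cc in _ring(r, c, rows, cols))
             for c in range(cols)] for r in range(rows)]


def morphology(labels, n=1, m=1):
    """Step 84: n erosions (DFG never eroded), n restoring dilations, m hole-filling dilations."""
    fg = [[lab in ("FG", "DFG") for lab in row] for row in labels]
    keep = [[lab == "DFG" for lab in row] for row in labels]
    for _ in range(n):
        fg = _erode(fg, keep)
    for _ in range(n + m):
        fg = _dilate(fg)
    return fg
```

For example, with a model row [0, 0, 100, 100] and a frame row [0, 100, 100, 100] (the same edge shifted by one pixel), the pixel at column 1 fails the direct test of Eqn. (6), but misaligned reports it as a likely alignment error. Likewise, an isolated FG pixel is removed by morphology(labels, n=1, m=0), whereas an isolated DFG pixel survives the erosion and is grown into a 3x3 blob by the restoring dilation.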
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In alternative emboolments, Step 84 may comprise filter- its standard deviation) is reasonably small, In an embodi- 
ing techniques other than or in addition to morphological mem of the present invention* Step 313 determines this by 
erosions and dilations. In general Step 84 may employ any comparing the standard deviation with a user-defined thresh- 
forro or forms of spatial and/or temporal filtering- old parameter, if the standard deviation is less than this 

Returning to FIG. 1, following Step 8, the algorithm s threshold, then the statistical background model (for that 
returns to Step 6, to determine whether or not all frames have pixel) is determined to be stable, 
been processed. If yes, then the processing of the frame ^s to the flow of Step 31, in FIG. S» if the background 
sequence is complete, and the process ends (Step 9). statistical model is determined to be mature (Step 312), it is 

This two-pass embodiment has the advantage of relative determined whether or not the background statistical model 
simplicity, and it is an acceptable approach for ar#ticatiohs io m ^ (step 313). If either of these tests (Steps 312 and 
not requiring immediate or low-latency processing. 313) fails, the process proceeds to Step 315, in which the 
Examples of such applications include off-line video com- background statistical model of the pixel being processed is 
pression and non-linear video editing and forensic process- updated using the current value of that pixel. Step 315 will 
Ing of security and surveillance video. On the other hand, . tie explained further below. 

many other applications such as video security and surveii- is If the background statistical model is determined to be 
lance in which timely event reporting is critical do have such ^ mtm > ^ ( | a steps 312 and 313), the process 
requirements, and the embodiments to be discussed below proceeds to Step 314, where it is determined whether or not 
are tailored to address these requirements. the pixel being processed matches the background staustical 

2. Second Embodiment-^One-Pass Segmentation ^ model If yes, then the background statistical model is 

FIG. 4 depicts a flowchart of a one-pass segmentation ydatednsing ^S^&^&^^mI 
process, according to a second embodiment ofthe invention. the process loops back to Step 311 to determine if all pixels 
Comparkg HG. 4 with FIG. 1 (the ftrst embodimemx fce to the tame ta* bem processed, 
second embodiment differs in that there is only a single pass Step 314 operates by detenniaing whether or not the 
of processing for each frame sequence. This single pass, as # current pixel value is within some range of the mean value 
shown in Steps 2, 3, 31, 32, 8 in FIG. 4, incorporates the ofthe pixel, according to the current background statistical 
processes of the second pass (Steps 5-8 in FIG- 1) with the model to one embodiment of tine invention, the range is a 
first pass (Steps 2~4m FK}. 1), albeit in a modified form, as user-defined range. In yet another embodiment, it is <fefer- 
wili be discussed below. mined to be a user-defined number of standard deviations; 

As in the case of the first embodiment, the second 30 i.e., the pixel value, x, matches the background statistical 
embodiment (one-pass process), shown in FIG, 4, begins by model if 

obtaining a frame sequence (Step 1). As in the nrstembodi- ^ 
ment, the process ftenperfcnns a test to determine whether ^ w 

or not an ofthe frames have yet been processed (Step 2). wbejft £ is the user-deflned number of standard deviations, 
Also as in the first emrxKlirrm 35 0; - Is mc c^rent pixel value; and x^ is the mean 

next frame to be processed is aligned with the scene model valueof the current pixel in the background statistical 
(Step 3), As discussed above, the scene model component of model ^ c f performing Step 314 is to ensure, to 

the background model is built and updated as part of Step 3, ^ extent possible, xhat only background pixels are used to 
so there is always at least a detenitiriisticaUy-detennined and update the background statistical model 

value in the background model at each location 40 . ci^H^i mod^i i« mutated 

At this point, the process includes a step of building a * ^ 
backgjound statistical model (Step 31). litis diflfers 80m * ^ embodiment, the bac^round 5 tetis ^^V C0 ^ 
c^w.T^kT!^ *&s of the mean and standard deviation ofthe values for 

^^^L^tT^^^X^^^ each pixel (ov*r the sequence of frames). These are corrv 
1^ process r^m^ poted according to Earn. (2) and (4a) above, 

all pixels m the frame being processed have been processed 45 p ~* ~* ' w , ' _ Cf _ - if 

(Step 311). If not, then the process determines whether or not the background statistical model is "mature" (Step 312) and "stable" (Step 313).

The reason for Steps 312 and 313 is that, initially, the statistical background model will not be sufficiently developed to make accurate decisions as to the nature of pixels. Thus, some number of frames should be processed before pixels are labeled (i.e., the background statistical model should be "mature"); in one embodiment of the present invention, this is a user-defined parameter. This may be implemented as a "look-ahead" procedure, in which a limited number of frames are used to accumulate the background statistical model prior to pixel labeling (Step 32 in FIG. 4).

While simply processing a user-defined number of frames may suffice to provide a mature statistical model, stability is a second concern (Step 313), and it depends upon the standard deviation of the background statistical model. In particular, as will be discussed below, the statistical background model includes a standard deviation for each pixel. The statistical model (for a particular pixel) is defined as having become "stable" when its variance (or, equivalently,

Following Step 315, the process loops back to Step 311 to determine if all pixels (in the current frame) have been processed. Once all of the pixels have been processed, the process proceeds to Step 316, where the background statistical model is finalized. This is done by assigning to each pixel its current mean value and standard deviation (i.e., the result of processing all of the frames up to that point).

Note that it is possible for the background statistical model for a given pixel never to stabilize. This generally indicates that the particular pixel is not a background pixel in the sequence of frames, and there is, therefore, no need to assign it a value for the purposes of the background statistical model. Noting that, as discussed above, a scene model is also built and updated, there is always at least a deterministically-determined value associated with each pixel in the background model.

Following Step 316, the process goes to Step 32, as shown in FIG. 4, where the pixels in the frame are labeled according to their type (i.e., definite foreground, foreground or background). Step 32 is shown in further detail in the flowchart of FIGS. 6a and 6b.
The following concepts are embodied in the description of Step 32 to follow. Ideally, labeling would always be done by testing each pixel against its corresponding point in the background statistical model, but this is not always possible. If the background statistical model is not ready to use on the basis of number of frames processed (i.e., "mature"), then the process must fall back on testing against the corresponding point in the scene model. If the background statistical model is ready to use but has not yet settled down (i.e., it is not "stable"), this is a sign that the pixel is varying and should be labeled as being foreground. If the background statistical model has, for some reason (i.e., because it fails to match the scene model or because it has become unsettled again), become unusable, the process must once again fall back on testing against the scene model.

As shown in FIG. 6a, Step 32 begins with Step 321, where it is determined whether or not all pixels (in the current frame) have been processed. If yes, Step 32 is complete; if not, the next pixel is processed in Steps 322 et seq.

Step 322 determines whether or not the background statistical model is mature. This is done in the same manner as in Step 312 of FIG. 5, discussed above. If not, the process proceeds to Step 323, where it is determined whether or not the pixel matches the background chromatic data of the corresponding point of the scene model.

Step 323 is performed by carrying out a test to determine whether or not the given pixel falls within some range of the background chromatic data value. This is analogous to Step 314 of FIG. 5, substituting the background chromatic data value for the statistical mean. The threshold may be determined in a similar fashion (predetermined, user-determined, or the like).

If Step 323 determines that the pixel does match the background chromatic data, then the pixel is labeled BG (following connector A) in Step 329 of FIG. 6b. From Step 329, the process loops back (via connector D) to Step 321.

If Step 323 determines that the pixel does not match the background chromatic data, then the pixel is labeled FG (following connector B) in Step 3210 of FIG. 6b. From Step 3210, the process loops back (via connector D) to Step 321.

If Step 322 determines that the background statistical model is mature, processing proceeds to Step 324, which determines whether or not the background statistical model is stable. If not, the process proceeds to Step 325, where it is determined if the background statistical model was ever stable (i.e., if it was once stable but is now unstable). If yes, then the process branches to Step 323, and the process proceeds from there as described above. If no, the pixel is labeled DFG (following connector C) in Step 3211 of FIG. 6b, after which the process loops back (via connector D) to Step 321.

If Step 324 determines that the background statistical model is stable, the process goes to Step 326. Step 326 tests whether the background statistical model matches the background chromatic data. Similar to the previous matching tests above, this test takes an absolute difference between the value of the background statistical model (i.e., the mean) for the pixel and the background chromatic data (i.e., of the scene model) for the pixel. This absolute difference is then compared to some threshold value, as above (predetermined, user-determined, or the like).

If Step 326 determines that there is not a match between the background statistical model and the background chromatic data, the process branches to Step 323, where processing proceeds in the same fashion as described above. If Step 326, on the other hand, determines that there is a match, the process continues to Step 327.

Step 327 determines whether or not the current pixel matches the background statistical model. This step is performed in the same manner as Step 314 of FIG. 5, discussed above. If the current pixel does match (which, as discussed above, is determined by comparing it to the mean value corresponding to the current pixel), the pixel is labeled BG (following connector A) in Step 329 of FIG. 6b, and the process then loops back (via connector D) to Step 321. If not, then further testing is performed in Step 328.

Step 328 determines whether, given that the current pixel value does not reflect a BG pixel, it reflects a FG pixel or a DFG pixel. This is done by determining if the pixel value is far from matching the background statistical model. As discussed above, a FG pixel is distinguished from a BG pixel (in Step 327) by determining if its value differs from the mean by more than a particular amount, for example, a number of standard deviations (see Eqn. (7)). Step 328 applies the same test, but using a larger range. Again, the threshold may be set as a predetermined parameter, as a computed parameter, or as a user-defined parameter, and it may be given in terms of a number of standard deviations from the mean, i.e.,

$$|x - \bar{x}| \le N\sigma,$$

where $x$ is the current pixel value, $\bar{x}$ and $\sigma$ are the mean and standard deviation of the background statistical model for that pixel, and $N$ is a number greater than $K$ of Eqn. (7). If the pixel value lies outside the range defined, for example, by this inequality, it is labeled DFG (following connector C) in Step 3211 of FIG. 6b, and the process loops back (via connector D) to Step 321. If it lies within the range, the pixel is labeled FG (following connector B) in Step 3210 of FIG. 6b, and the process proceeds (via connector D) to Step 321.

After Step 32 is complete, the process proceeds to Step 8, as shown in FIG. 4, where spatial/temporal filtering is performed on the pixels in the frame. Step 8 is implemented, in this embodiment of the invention, in the same manner in which it is implemented for the two-pass embodiment, except that the pixel labeling algorithm of FIGS. 6a and 6b is used for Steps 833 and 837 of Step 83 (as opposed to the pixel labeling algorithms used in the two-pass embodiment). Following Step 8, the process loops back to Step 2, where, if all frames have been processed, the process ends.

A single-pass approach, like the one presented here, has the advantage of not requiring a second pass, thus reducing the latency associated with the process. This is useful for applications in which high latency would be detrimental, for example, teleconferencing, webcasting, real-time gaming, and the like.

3. Third Embodiment: Modified One-Pass Segmentation

While the one-pass approach described above has a lower latency than the two-pass approach, it does have a disadvantage in regard to the background statistical model. In particular, the cumulative statistical modeling approach used in the one-pass embodiment of the invention may stabilize on a non-representative statistical model for an element (i.e., pixel, region, etc.; that is, whatever size element is under consideration). If the values (e.g., chromatic values) of frame elements corresponding to a particular element of the video scene fundamentally change (i.e., something happens to change the video, for example, a parked car driving away, a moving car parking, the lighting changes, etc.), then the scene model element will no longer accurately represent the true scene. This can be addressed by utilizing a mechanism for dynamically updating the background statistical model so that at any given time it accurately represents the true
nature of the scene depicted in the video. Such a mechanism is depicted in the embodiment of the invention shown in FIG. 7.

In FIG. 7, Steps 1-3, 32, 8, and 9 are as described in the one-pass embodiment above. The embodiment of FIG. 7 differs in Step 310, in which a secondary background statistical model is built in parallel with the background statistical model (FIGS. 8a and 8b). As shown in FIG. 8a, Step 310 includes all of the steps of Step 31 in FIG. 5 (using the same reference numerals), and it begins with a step of determining whether or not all pixels have yet been processed (Step 311). If not, the next pixel is processed by proceeding to Step 312.

In Step 312, it is determined whether or not the background statistical model is mature. If not, the process branches to Step 315, where the pixel is used to update the background statistical model. Following Step 315, the process loops back to Step 311.

If Step 312 determines that the background statistical model is mature, the process proceeds to Step 313, where it is determined whether or not the background statistical model is stable. If it is not, then, as in the case of a negative determination in Step 312, the process branches to Step 315 (and then loops back to Step 311). Otherwise, the process proceeds to Step 314, where it is determined whether or not the pixel under consideration matches the background statistical model. If it does, the process proceeds with Step 315 (and then loops back to Step 311); otherwise, the process executes the steps shown in FIG. 8b, which build and update a secondary background statistical model. This secondary background statistical model is built in parallel with the background statistical model, as reflected in FIG. 8b; uses the same procedures as are used to build and update the background statistical model; and represents the pixel values that do not match the background statistical model.

Following a negative determination in Step 314, the process then determines whether or not the secondary background statistical model is mature (Step 3107). This determination is made in the same fashion as in Step 312. If not, the process branches to Step 3109, where the secondary background statistical model is updated, using the same procedures as for the background statistical model (Step 315). From Step 3109, the process loops back to Step 311 (in FIG. 8a).

If Step 3107 determines that the secondary background statistical model is mature, the process proceeds to Step 3108, which determines (using the same procedures as in Step 313) whether or not the secondary background statistical model is stable. If not, the process proceeds to Step 3109 (and from there to Step 311). If yes, then the process branches to Step 31010, in which the background statistical model is replaced with the secondary background statistical model, after which the process loops back to Step 311. Additionally, concurrently with the replacement of the background statistical model by the secondary background statistical model in Step 31010, the scene model data is replaced with the mean value of the secondary statistical model. At this point, the secondary background statistical model is reset to zero, and a new one will be built using subsequent data.

This modified one-pass embodiment has the advantage of improved statistical accuracy over the one-pass embodiment, and it solves the potential problem of changing background images. It does this while still maintaining improved latency time over the two-pass embodiment, and at only a negligible decrease in processing speed compared with the one-pass embodiment.

Additional Embodiments and Remarks

While the above discussion considers two-level and three-level pixel labeling algorithms, this embodiment is not limited [...] fuzzy or soft-decision logic would be used [...].

As discussed above, the invention, including all of the embodiments discussed in the preceding sections, may be embodied in the form of a computer system or in the form of a computer-readable medium containing software implementing the invention. This is depicted in FIG. 9, which shows a plan view for a computer system for the invention. The computer 91 includes a computer-readable medium embodying software for implementing the invention and/or [...] segmented video, as shown. Alternatively, the segmented video may be further processed within the computer.

As also discussed above, the statistical pixel modeling methods described above may be incorporated into a method of implementing an intelligent video surveillance system. FIG. [...] depicts an embodiment of such a method. In particular, block 1001 represents the use of statistical pixel modeling, e.g., as described above. Once the statistical pixel modeling has been completed, block 1002 uses the results to identify and classify objects. Block 1002 may use, for example, statistical or template-oriented methods for performing such identification and classification. In performing identification and classification, it is determined whether or not a given object is an object of interest; for example, one may be interested in tracking the movements of people through an area under surveillance, which would make people "objects of interest." In block 1003, behaviors of objects of interest are analyzed; for example, it may be determined if a person has entered a restricted area. Finally, in block 1004, if desired, various notifications may be sent out or other appropriate actions taken.

The invention has been described in detail with respect to preferred embodiments, and it will now be apparent from the foregoing to those skilled in the art that changes and modifications may be made without departing from the invention in its broader aspects. The invention, therefore, as defined in the appended claims, is intended to cover all such changes and modifications as fall within the true spirit of the invention.

We claim:

1. A method of implementing an intelligent video surveillance system, comprising:
obtaining a frame sequence from an input video stream;
executing a first-pass method for each frame of the frame sequence, the first-pass method comprising the steps of:
aligning the frame with a scene model; and
updating a background statistical model;
finalizing the background statistical model;
executing a second-pass method for each frame of the frame sequence, the second-pass method comprising the steps of:
labeling each region of the frame; and
performing spatial/temporal filtering of the regions of the frame;
identifying and classifying objects using the labeled and filtered regions; and
analyzing behaviors of at least one of the objects.

2. A computer-readable medium comprising software implementing the method of claim 1.

3. An intelligent video surveillance system comprising a computer system comprising:
a computer; and
a computer-readable medium according to claim 2.

4. The method of claim 1, wherein said analyzing behaviors of at least one of the objects comprises:
tracking at least one of the objects.

5. The method of claim 1, further comprising:
creating at least one rule to detect at least one specific activity;
wherein said analyzing behaviors of at least one of the objects includes applying the at least one rule.

6. The method of claim 5, wherein said at least one rule includes at least one virtual tripwire and determining when the at least one virtual tripwire is crossed.

7. The method of claim 5, wherein said at least one rule includes a definition of at least one area and the determining at least one of when an object enters, when an object leaves, and when an object loiters in the at least one area.

8. The method of claim 5, wherein said at least one rule includes at least one of determining when an object is added to a scene and determining when an object is removed from a scene.

9. A method of implementing an automated closed-circuit television (CCTV) surveillance system, comprising:
providing CCTV equipment generating an input video stream; and
implementing the method of claim 1.

10. A method of implementing an automated security system, comprising the method of claim 1.

11. A method of implementing an automated anti-terrorism system, comprising the method of claim 1.

12. A method of implementing an automated market research system, comprising the method of claim 1.

13. The method of claim 12, wherein said analyzing behaviors of at least one of the objects comprises:
tracking behaviors of at least one of the objects in at least one retail location.

14. A method of implementing an automated traffic monitoring system, comprising the method of claim 1.

15. The method of claim 14, wherein said analyzing behaviors of at least one of the objects comprises at least one of:
detecting wrong-way traffic;
detecting a broken-down vehicle;
detecting an accident; and
detecting a road blockage.

16. A method of implementing a video compression system comprising the method of claim 1.

17. A method of implementing an intelligent video surveillance system, comprising:
obtaining a frame sequence from a video stream;
for each frame in the frame sequence, performing the following steps:
aligning the frame with a scene model;
building a background statistical model;
labeling the regions of the frame; and
performing spatial/temporal filtering;
identifying and classifying objects based on the results of the labeling and filtering; and
analyzing behaviors of at least one object.

18. A computer-readable medium comprising software implementing the method of claim 17.

19. An intelligent video surveillance system comprising a computer system comprising:
a computer; and
a computer-readable medium according to claim 18.

20. The method of claim 17, wherein said analyzing behaviors of at least one of the objects comprises:
tracking at least one of the objects.

21. The method of claim 17, further comprising:
creating at least one rule to detect at least one specific activity;
wherein said analyzing behaviors of at least one of the objects includes applying the at least one rule.

22. The method of claim 21, wherein said at least one rule includes at least one virtual tripwire and determining when the at least one virtual tripwire is crossed.

23. The method of claim 21, wherein said at least one rule includes a definition of at least one area and the determining at least one of when an object enters, when an object leaves, and when an object loiters in the at least one area.

24. The method of claim 21, wherein said at least one rule includes at least one of determining when an object is added to a scene and determining when an object is removed from a scene.

25. A method of implementing an automated closed-circuit television (CCTV) surveillance system, comprising:
providing CCTV equipment generating an input video stream; and
implementing the method of claim 17.

26. A method of implementing an automated security system, comprising the method of claim 17.

27. A method of implementing an automated anti-terrorism system, comprising the method of claim 17.

28. A method of implementing an automated market research system, comprising the method of claim 17.

29. The method of claim 28, wherein said analyzing behaviors of at least one of the objects comprises:
tracking behaviors of at least one of the objects in at least one retail location.

30. A method of implementing an automated traffic monitoring system, comprising the method of claim 17.

31. The method of claim 30, wherein said analyzing behaviors of at least one of the objects comprises at least one of:
detecting wrong-way traffic;
detecting a broken-down vehicle;
detecting an accident; and
detecting a road blockage.

32. A method of implementing a video compression system, comprising the method of claim 17.

33. A method of implementing a video compression system, comprising the method of claim 17.

34. A method of implementing an intelligent video surveillance system, comprising:
obtaining a frame sequence from a video stream;
for each frame in the frame sequence, performing the following steps:
aligning the frame with a scene model;
building a background statistical model and a secondary statistical model;
labeling the regions of the frame; and

performing spatial/temporal filtering; 
identifying and classifying objects based on the results of
the labeling and filtering; and
analyzing behaviors of at least one object.

35. A computer-readable medium comprising software 
implementing the method of claim 34. 

36. An intelligent video surveillance system comprising a 
computer system comprising: 

a computer; and
a computer-readable medium according to claim 35. 

37. The method of claim 34, wherein said analyzing 
behaviors of at least one of the objects comprises: 

tracking at least one of the objects. 

38. The method of claim 34, further comprising:
creating at least one rule to detect at least one specific 
activity; 

wherein said analyzing behaviors of at least one of the 
objects includes applying the at least one rule.

39. The method of claim 38, wherein said at least one rule
includes at least one virtual tripwire and determining when 
the at least one virtual tripwire is crossed. 

40. The method of claim 38, wherein said at least one rule 
includes a definition of at least one area and the determining
at least one of when an object enters, when an object leaves,
and when an object loiters in the at least one area.

41. The method of claim 38, wherein said at least one rule 
includes at least one of determining when an object is added 
to a scene and determining when an object is removed from 
a scene.

42. A method of implementing an automated closed-circuit television (CCTV) surveillance system, comprising:

providing CCTV equipment generating an input video 
stream; and 

implementing the method of claim 34.

43. A method of implementing an automated security 
system, comprising the method of claim 34. 

44. A method of implementing an automated anti-terrorism system, comprising the method of claim 34.

45. A method of implementing an automated market
research system, comprising the method of claim 34. 

46. The method of claim 45, wherein said analyzing
behaviors of at least one of the objects comprises: 

tracking behaviors of at least one of the objects in at least 
one retail location.

47. A method of implementing an automated traffic monitoring system, comprising the method of claim 34.

48. The method of claim 47, wherein said analyzing 
behaviors of at least one of the objects comprises at least one 
of: 



detecting wrong-way traffic; 
detecting a broken-down vehicle; 
detecting an accident; and 
detecting a road blockage. 

49. A method of implementing a video compression 
system, comprising the method of claim 34. 

50. An apparatus for intelligent video surveillance 
adapted to perform the method comprising: 

obtaining a frame sequence from an input video stream; 

executing a first-pass method for each frame of the frame 
sequence, the first-pass method comprising the steps of: 
aligning the frame with a scene model; and 
updating a background statistical model; 

finalizing the background statistical model; 

executing a second-pass method for each frame of the 
frame sequence, the second-pass method comprising 
the steps of: 

labeling each region of the frame; and 
performing spatial/temporal filtering of the regions of
the frame;

identifying and classifying objects using the labeled and
filtered regions; and
analyzing behaviors of at least one of the objects. 

51. The apparatus of claim 50 wherein the apparatus
comprises application-specific hardware to emulate a computer
and/or software adapted to perform said obtaining,
said executing a first-pass method, said finalizing, said
executing a second-pass method, said identifying, and said
analyzing.

52. An apparatus for intelligent video surveillance
adapted to perform the method comprising: 

obtaining a frame sequence from a video stream; 
for each frame in the frame sequence, performing the 
following steps: 

aligning the frame with a scene model; 

building a background statistical model; 

labeling the regions of the frame; and 

performing spatial/temporal filtering;
identifying and classifying objects based on the results of
the labeling and filtering; and
analyzing behaviors of at least one object. 

53. The apparatus of claim 52 wherein the apparatus
comprises application-specific hardware to emulate a com- 
puter and/or software adapted to perform said obtaining, 
said aligning, said building, said labeling, said filtering, said 
identifying, and said analyzing. 



